Homography estimation is erroneous in the case of large-baseline due to the low image overlay and limited receptive field. To address it, we propose a progressive estimation strategy by converting large-baseline homography into multiple intermediate ones, cumulatively multiplying these intermediate items can reconstruct the initial homography. Meanwhile, a semi-supervised homography identity loss, which consists of two components: a supervised objective and an unsupervised objective, is introduced. The first supervised loss is acting to optimize intermediate homographies, while the second unsupervised one helps to estimate a large-baseline homography without photometric losses. To validate our method, we propose a large-scale dataset that covers regular and challenging scenes. Experiments show that our method achieves state-of-the-art performance in large-baseline scenes while keeping competitive performance in small-baseline scenes. Code and dataset are available at https://github.com/megvii-research/LBHomo.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
动机:癌症是异质的,影响了个性化治疗的精确方法。准确的亚型可以导致癌症患者的生存率更好。高通量技术为癌症亚型提供了多个OMIC数据。但是,由于OMICS数据的大量和高维度,精确的癌症亚型仍然具有挑战性。结果:这项研究提出了基于MLP和变压器块的深度学习方法拟议的亚型形式,以提取多摩学数据的低维表示。 K-均值和共识聚类也用于获得准确的亚型结果。我们比较了TCGA 10癌症类型的其他最先进的亚型方法。我们发现,基于生存分析,亚型形式可以在5000多个肿瘤的基准数据集上表现更好。此外,亚型形式还取得了泛滥亚型的出色结果,这可以帮助分析分子水平上各种癌症类型的共同点和差异。最后,我们将亚型格式应用于TCGA 10类型的癌症。我们确定了50种基本生物标志物,可用于研究靶向癌症药物并促进精密医学时代的癌症治疗。
translated by 谷歌翻译
眼底摄影是诊断和监测眼部疾病的诊所的常规检查。但是,对于白内障患者,底眼图像始终会遭受由云晶状体引起的质量降解。降解阻止了眼科医生或计算机辅助系统可靠的诊断。为了提高临床诊断的确定性,已经提出了恢复算法来提高眼底图像的质量。不幸的是,这些算法的部署仍然存在挑战,例如收集足够的培训数据和保存视网膜结构。在本文中,为了规避严格的部署要求,从共享相同结构的合成数据中开发出了针对白内障底底图像的结构一致的恢复网络(SCR-NET)。白内障仿真模型首先是设计用于收集由白内障底面图像共享相同结构的合成性白内障集(SC)的。然后从SCS中提取高频组件(HFC)以约束结构一致性,从而强制执行SCR-NET中的结构保留。该实验证明了SCR-NET与最新方法和后续临床应用的比较中的有效性。该代码可从https://github.com/liamheng/arcnet-medical-image-enhancement获得。
translated by 谷歌翻译
基于方面的情绪分析(ABSA)任务由三个典型的子特点组成:术语术语提取,意见术语提取和情感极性分类。这三个子组织通常是共同执行的,以节省资源并减少管道中的错误传播。但是,大多数现有联合模型只关注编码器共享的福利在子任务之间共享,但忽略差异。因此,我们提出了一个关节ABSA模型,它不仅享有编码器共享的好处,而且还专注于提高模型效率的差异。详细地,我们介绍了双编码器设计,其中一对编码器特别侧重于候选方识对分类,并且原始编码器对序列标记进行注意。经验结果表明,我们的拟议模型显示了鲁棒性,并显着优于前一个基准数据集的先前最先进。
translated by 谷歌翻译
在弱照明条件下捕获的图像可能会严重降低图像质量。求解一系列低光图像的降解可以有效地提高图像的视觉质量和高级视觉任务的性能。在本研究中,提出了一种新的基于RETINEX的实际网络(R2RNET),用于低光图像增强,其包括三个子网:DECOM-NET,DENOISE-NET和RELIGHT-NET。这三个子网分别用于分解,去噪,对比增强和细节保存。我们的R2RNET不仅使用图像的空间信息来提高对比度,还使用频率信息来保留细节。因此,我们的模型对所有退化的图像进行了更强大的结果。与在合成图像上培训的最先前的方法不同,我们收集了第一个大型现实世界配对的低/普通灯图像数据集(LSRW数据集),以满足培训要求,使我们的模型具有更好的现实世界中的泛化性能场景。对公共数据集的广泛实验表明,我们的方法在定量和视觉上以现有的最先进方法优于现有的现有方法。此外,我们的结果表明,通过使用我们在低光条件下的方法获得的增强的结果,可以有效地改善高级视觉任务(即面部检测)的性能。我们的代码和LSRW数据集可用于:https://github.com/abcdef2000/r2rnet。
translated by 谷歌翻译
本文回顾了关于压缩视频质量增强质量的第一个NTIRE挑战,重点是拟议的方法和结果。在此挑战中,采用了新的大型不同视频(LDV)数据集。挑战有三个曲目。Track 1和2的目标是增强HEVC在固定QP上压缩的视频,而Track 3旨在增强X265压缩的视频,以固定的位速率压缩。此外,轨道1和3的质量提高了提高保真度(PSNR)的目标,以及提高感知质量的2个目标。这三个曲目完全吸引了482个注册。在测试阶段,分别提交了12个团队,8支球队和11支球队,分别提交了轨道1、2和3的最终结果。拟议的方法和解决方案衡量视频质量增强的最先进。挑战的首页:https://github.com/renyang-home/ntire21_venh
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from von-Neuman-Landauer's principle. modelling human learning is difficult as how people learn varies from one to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under Church-Turing thesis. We find that deep generative model like variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models including deep neural networks, for image recognition, low resource language processing, and character recognition.
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译